Install and load awst

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# awst is installed from its GitHub repository, then attached
BiocManager::install("drisso/awst")
library(awst)

Data import and cleaning

The SEQC datasets are available through the seqc Bioconductor package, which can be installed as follows.

BiocManager::install("seqc")

We next build the data matrix from the “ILM_aceview” experiments. We remove duplicate gene symbols, ERCC spike-ins, and genes with no ENTREZ ID.
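
The three filters described above can be sketched on a toy table. This is only an illustration, not the code used for the analysis: the column names (`Symbol`, `EntrezID`, `IsERCC`) and the sample columns are assumptions made for the example, and keeping the first row of each duplicated symbol is one possible reading of "remove duplicate gene symbols".

```r
# Toy annotation + count table. Column names are hypothetical and only
# mimic a gene-level table with annotation columns followed by samples.
toy <- data.frame(
  Symbol   = c("GeneA", "GeneB", "GeneB", "ERCC-00002", "GeneC"),
  EntrezID = c(1L, 2L, 2L, NA, NA),
  IsERCC   = c(FALSE, FALSE, FALSE, TRUE, FALSE),
  A_1      = c(10, 0, 5, 300, 7),
  B_1      = c(12, 3, 4, 280, 9),
  stringsAsFactors = FALSE
)

keep <- !toy$IsERCC &        # drop ERCC spike-ins
  !is.na(toy$EntrezID) &     # drop genes with no ENTREZ ID
  !duplicated(toy$Symbol)    # assumption: keep the first row per symbol

# Genes in rows, samples in columns
counts <- as.matrix(toy[keep, c("A_1", "B_1")])
rownames(counts) <- toy$Symbol[keep]
counts
```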

Distribution of samples per site

       AGR  BGI  CNL  COH  MAY  NVS  Sum
A        4    5    5    4    5    4   27
B        4    5    5    4    5    4   27
C        4    5    5    4    5    4   27
D        4    5    5    4    5    4   27
Sum     16   20   20   16   20   16  108
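
A cross-tabulation with margins like the one above can be produced with `table()` and `addmargins()`. The sketch below assumes hypothetical sample names of the form `<condition>_<replicate>_<site>`; the actual naming scheme of the SEQC samples may differ.

```r
# Hypothetical sample names: "<condition>_<replicate>_<site>"
sample_names <- c("A_1_AGR", "A_2_AGR", "B_1_AGR",
                  "A_1_BGI", "B_1_BGI", "B_2_BGI")

# Split each name on "_" and pick the condition and site fields
parts     <- strsplit(sample_names, "_", fixed = TRUE)
condition <- vapply(parts, `[`, character(1), 1)
site      <- vapply(parts, `[`, character(1), 3)

# Condition-by-site counts, with "Sum" row and column added
tab <- addmargins(table(condition, site))
tab
```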

Figure 2

Figure 2a: clustering on RSEM data after awst

Figure 2b: clustering on CPM data after awst

Figure 2c: clustering on CPM data (std values)
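
For readers unfamiliar with the "std values" panels, here is a minimal sketch of counts-per-million scaling followed by gene-wise standardization and selection of the most variable genes. The simulated counts, the `log2(x + 1)` step, and the use of variance as the ranking criterion are assumptions made for illustration; the source does not state the exact choices.

```r
set.seed(1)
# Simulated counts: 50 genes (rows) x 6 samples (columns)
counts <- matrix(rpois(50 * 6, lambda = 20), nrow = 50,
                 dimnames = list(paste0("g", 1:50), paste0("s", 1:6)))

# Counts per million: divide each sample by its library size
cpm <- t(t(counts) / colSums(counts)) * 1e6

# Assumption: work on the log2 scale with a pseudocount of 1
log_cpm <- log2(cpm + 1)

# Assumption: rank genes by variance and keep the top n_top
n_top <- 10
top <- order(apply(log_cpm, 1, var), decreasing = TRUE)[seq_len(n_top)]

# Gene-wise standardization: scale() acts on columns, so transpose twice
z <- t(scale(t(log_cpm[top, ])))
dim(z)
```

Each row of `z` then has mean 0 and standard deviation 1 across samples, so every selected gene contributes equally to a Euclidean distance between samples.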

Figure 2d: clustering on CPM data (std values; top 100 genes)

Figure 2e: clustering on TPM data (std values)

Figure 2f: clustering on TPM data (std values; top 100 genes)

Figure 2g: clustering on TPM data after awst

Figure 2h: clustering on TPM data after Hart transformation
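
Assuming "Hart transformation" refers to the zFPKM-style transform of Hart et al., one common formulation centers log2 expression at the mode of its density and scales by a standard deviation estimated from the right half of the distribution. The sketch below applies that formulation to simulated TPM values for a single sample; it is not taken from the analysis code, and real data would need zero values handled before taking logs.

```r
set.seed(1)
# Simulated TPM for one sample: a low "background" component plus a
# higher "expressed" component (all values strictly positive)
tpm <- c(2^rnorm(300, mean = -3, sd = 2),
         2^rnorm(700, mean = 4, sd = 1.5))
log_tpm <- log2(tpm)

# Mode of the kernel density estimate of log2 expression
d  <- density(log_tpm)
mu <- d$x[which.max(d$y)]

# Half-Gaussian fit to the right side: for a half-normal,
# E[X - mu | X > mu] = sigma * sqrt(2 / pi)
U     <- mean(log_tpm[log_tpm > mu])
sigma <- (U - mu) * sqrt(pi / 2)

# Transformed values
z <- (log_tpm - mu) / sigma
summary(z)
```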

Figure of silhouettes (given the estimated partition)

1: A (UHRR), 2: B (HBRR), 3: C (0.75A+0.25B), 4: D (0.25A+0.75B)

RSEM data with AWST                                       TPM data with gene-wise standardization
CPM data with AWST                                        TPM data with gene-wise standardization, top 100 genes
CPM data with gene-wise standardization                   TPM data with AWST
CPM data with gene-wise standardization, top 100 genes    TPM data with Hart transformation
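
As a rough guide to how silhouettes for an estimated partition can be obtained, the sketch below clusters simulated samples hierarchically, cuts the tree into four groups, and computes silhouette widths with the cluster package. The simulated data, the Euclidean distance, and the Ward linkage (`"ward.D2"`) are assumptions for the example, not a statement of what was used for the figures.

```r
library(cluster)

set.seed(1)
# Simulated data: 20 samples (rows) in 4 well-separated groups, 20 features
groups <- rep(1:4, each = 5)
x <- matrix(rnorm(20 * 20), nrow = 20) + 4 * groups

# Hierarchical clustering on Euclidean distances (assumed linkage: Ward)
d  <- dist(x)
hc <- hclust(d, method = "ward.D2")

# Estimated partition: cut the dendrogram into 4 clusters
estimated <- cutree(hc, k = 4)

# Silhouette widths for the estimated partition
sil <- silhouette(estimated, d)
avg_width <- summary(sil)$avg.width
avg_width
```

Passing a vector of known group labels instead of `estimated` gives silhouettes for a theoretical partition.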

Figure 3

3 perturbed samples

Figure 3a: clustering on CPM data (std values; top 2500 genes)

Figure 3e: clustering on RSEM data after awst

Two-thirds perturbed samples

Figure 3c: clustering on CPM data (std values; top 2500 genes)

Figure 3g: clustering on RSEM data after awst

All perturbed samples

Figure 3d: clustering on CPM data (std values; top 2500 genes)

Figure 3h: clustering on RSEM data after awst

Figure of silhouettes (given the theoretical partition)

1: A (UHRR), 2: B (HBRR), 3: C (0.75A+0.25B), 4: D (0.25A+0.75B)

CPM data with gene-wise standardization, top 2,500 genes    RSEM data with AWST
One third perturbed samples. CPM data top 2,500 genes       One third perturbed samples. RSEM data with AWST
Two thirds perturbed samples. CPM data top 2,500 genes      Two thirds perturbed samples. RSEM data with AWST
All perturbed samples. CPM data top 2,500 genes             All perturbed samples. RSEM data with AWST

Figure of principal components (colored by theoretical partition)

1: A (UHRR), 2: B (HBRR), 3: C (0.75A+0.25B), 4: D (0.25A+0.75B)

CPM data with gene-wise standardization, top 2,500 genes    RSEM data with AWST
One third perturbed samples. CPM data top 2,500 genes       One third perturbed samples. RSEM data with AWST
Two thirds perturbed samples. CPM data top 2,500 genes      Two thirds perturbed samples. RSEM data with AWST
All perturbed samples. CPM data top 2,500 genes             All perturbed samples. RSEM data with AWST
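
A principal component plot colored by a known partition can be sketched with `prcomp()`. The data here are simulated and the group labels 1 to 4 simply stand in for samples A to D; centering without scaling is an assumption made for the example.

```r
set.seed(1)
# Simulated data: 20 samples (rows), 50 features, 4 known groups
groups <- rep(1:4, each = 5)
x <- matrix(rnorm(20 * 50), nrow = 20) + 2 * groups

# PCA on samples; features are centered but not rescaled (assumption)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Proportion of variance carried by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# First two components, colored by the known partition
plot(pca$x[, 1:2], col = groups, pch = 19,
     xlab = sprintf("PC1 (%.0f%%)", 100 * var_explained[1]),
     ylab = sprintf("PC2 (%.0f%%)", 100 * var_explained[2]))
legend("topright", legend = paste("group", 1:4), col = 1:4, pch = 19)
```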

Session info

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] awst_0.0.3        dendextend_1.12.0 cluster_2.1.0     knitr_1.25       
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.2        pillar_1.4.2      compiler_3.6.0   
##  [4] highr_0.8         viridis_0.5.1     tools_3.6.0      
##  [7] digest_0.6.21     evaluate_0.14     tibble_2.1.3     
## [10] gtable_0.3.0      viridisLite_0.3.0 pkgconfig_2.0.3  
## [13] rlang_0.4.0       yaml_2.2.0        xfun_0.10        
## [16] gridExtra_2.3     stringr_1.4.0     dplyr_0.8.3      
## [19] grid_3.6.0        tidyselect_0.2.5  glue_1.3.1       
## [22] R6_2.4.0          rmarkdown_1.16    ggplot2_3.2.1    
## [25] purrr_0.3.2       magrittr_1.5      scales_1.0.0     
## [28] htmltools_0.3.6   assertthat_0.2.1  colorspace_1.4-1 
## [31] stringi_1.4.3     lazyeval_0.2.2    munsell_0.5.0    
## [34] crayon_1.3.4